Auxiliary Lexicon Word Prediction for Cross-Domain Word Segmentation
Similar resources
Learning Word Segmentation Rules for Tag Prediction
In our previous work we introduced a hybrid, GA&ILP-based approach for learning of stem-suffix segmentation rules from an unmarked list of words. Evaluation of the method was made difficult by the lack of word corpora annotated with their morphological segmentation. Here the hybrid approach is evaluated indirectly, on the task of tag prediction. A pair of stem-tag and suffix-tag lexicons is obt...
Probabilistic Model for Segmentation Based Word Recognition with Lexicon
The problem of off-line reading of unconstrained handwritten words has been studied extensively due to its role in many important applications such as reading addresses on mail-pieces [3, 6, 11], reading amounts on bank checks [7, 10], extracting census data on forms [2, 9], and reading address blocks on tax forms [12]. The main challenges are the wide variety of writing styles, poor image quality ...
Tibetan Unknown Word Identification from News Corpora for Supporting Lexicon-based Tibetan Word Segmentation
In Tibetan, words are written consecutively without delimiters, so finding unknown word boundaries is difficult. This paper presents a hybrid approach to Tibetan unknown word identification for offline corpus processing. First, Tibetan named entities are preprocessed based on natural annotation. Second, other Tibetan unknown words are extracted from word segmentation fragments using MTC, the co...
Domain-specific Word Prediction for Augmentative Communication
Many augmentative communication systems employ word prediction to help minimize the number of user actions needed to construct messages. Statistical prediction techniques rely upon a database (model) of word frequencies and inter-word correlations derived from a large text corpus. One potential means to improve prediction is to create a set of models derived from domain-specific corpora, dynami...
Word segmentation in Persian continuous speech using F0 contour
Word segmentation in continuous speech is a complex cognitive process. Previous research on spoken word segmentation has revealed that in fixed-stress languages, listeners use acoustic cues to stress to de-segment speech into words. It has been further assumed that stress in non-final or non-initial position hinders the demarcative function of this prosodic factor. In Persian, stress is retract...
Journal
Journal title: Journal of Natural Language Processing
Year: 2020
ISSN: 1340-7619,2185-8314
DOI: 10.5715/jnlp.27.573